decision tree learnability
Estimating decision tree learnability with polylogarithmic sample complexity
We show that top-down decision tree learning heuristics (such as ID3, C4.5, and CART) are amenable to highly efficient {\sl learnability estimation}: for monotone target functions, the error of the decision tree hypothesis constructed by these heuristics can be estimated with {\sl polylogarithmically} many labeled examples, exponentially smaller than the number necessary to run these heuristics, and indeed, exponentially smaller than information-theoretic minimum required to learn a good decision tree. This adds to a small but growing list of fundamental learning algorithms that have been shown to be amenable to learnability estimation. En route to this result, we design and analyze sample-efficient {\sl minibatch} versions of top-down decision tree learning heuristics and show that they achieve the same provable guarantees as the full-batch versions. We further give ``active local'' versions of these heuristics: given a test point $x^\star$, we show how the label $T(x^\star)$ of the decision tree hypothesis $T$ can be computed with polylogarithmically many labeled examples, exponentially smaller than the number necessary to learn~$T$.
Review for NeurIPS paper: Estimating decision tree learnability with polylogarithmic sample complexity
Additional Feedback: The paper is not interesting enough for a competitive conference. It is good to have these results in the literature, but I suggest to send it to a journal. Having read the reviews, and following the discussion, I still think that this does not below in a competitive conference. Indeed, as the authors stress in their response, the power of the result is due to the specific algorithm developed here. Nevertheless, I cannot be excited by it, given the monotonicity assumption and the fact that it applies only to the uniform distribution setting. I agree that it's an interesting result, but I think that it's not interesting enough nor important enough for a top conference.
Review for NeurIPS paper: Estimating decision tree learnability with polylogarithmic sample complexity
The submission got four reviews that were quite polarised in their recommendations, with two against accepting and two strongly in favour. The disagreement did not concern the technical quality of the paper. The reviewers agree that the theoretical work in this paper has been very competently performed and in the context of the problem the authors consider, the results are interesting and advance the state of the art. The disagreement is over whether the results are significant enough for NeurIPS or would be more appropriate for a specialised theory conference. The main objections against accepting are (i) the results are not surprising, (ii) the assumptions (monotonicity and uniform distribution) are strong and (iii) the overall computational complexity is high.
Estimating decision tree learnability with polylogarithmic sample complexity
We show that top-down decision tree learning heuristics (such as ID3, C4.5, and CART) are amenable to highly efficient {\sl learnability estimation}: for monotone target functions, the error of the decision tree hypothesis constructed by these heuristics can be estimated with {\sl polylogarithmically} many labeled examples, exponentially smaller than the number necessary to run these heuristics, and indeed, exponentially smaller than information-theoretic minimum required to learn a good decision tree. This adds to a small but growing list of fundamental learning algorithms that have been shown to be amenable to learnability estimation. En route to this result, we design and analyze sample-efficient {\sl minibatch} versions of top-down decision tree learning heuristics and show that they achieve the same provable guarantees as the full-batch versions. We further give active local'' versions of these heuristics: given a test point x \star, we show how the label T(x \star) of the decision tree hypothesis T can be computed with polylogarithmically many labeled examples, exponentially smaller than the number necessary to learn T .